Chapter 2: Study Design

Overview

  • 2 Study design
  • 2.1 Sampling principles and strategies
    • 2.1.1 Populations and samples
    • 2.1.2 Parameters and statistics
    • 2.1.3 Anecdotal evidence
    • 2.1.4 Sampling from a population
  • 2.2 Experiments
    • 2.2.1 Principles of experimental design
    • 2.2.2 Reducing bias in human experiments
  • 2.3 Observational studies

2.1 Sampling principles and strategies

2.1.1 Populations and samples

Research Question: What is the average mercury content in swordfish in the Atlantic Ocean?

  • The population of interest is all the swordfish in the Atlantic Ocean.
  • Researchers collect a sample of 60 swordfish and measure their mercury content.

2.1.2 Parameters and statistics

Research Question: What is the average mercury content in swordfish in the Atlantic Ocean?

  • The population of interest is all the swordfish in the Atlantic Ocean.
  • Researchers collect a sample of 60 swordfish and measure their mercury content.
    • They calculate the average mercury content of the swordfish in this sample. This number is a statistic.
  • This statistic is an estimate of a parameter, the true average mercury content of all swordfish in the Atlantic.

Key point: Statistics are calculated from a sample, while parameters are properties of the entire population.

Group Activity

A recent article in a college newspaper stated that college students get an average of 5.5 hours of sleep each night. A student who was skeptical about this value decided to conduct a survey by asking 25 randomly-chosen students about their sleep habits. On average, the surveyed students slept 6.25 hours per night.

  1. What is the population of interest?
  2. What is the sample?
  3. Identify the parameter in this situation.
  4. Identify the corresponding statistic.

2.1.3 Anecdotal evidence

An example of faulty reasoning:

A man on the news got mercury poisoning from eating swordfish, so the average mercury concentration in swordfish must be dangerously high.

  • This fallacy relies on anecdotal evidence.
  • Such evidence may be true and verifiable, but it may only represent extraordinary cases and therefore not be a good representation of the population.
  • Good statistical studies collect a sample of data from a specified population in systematic ways.

2.1.4 Sampling from a population

Research Question: Over the last five years, what is the average time to complete a degree for Duke undergrads?

  • Random selection: Write each graduate’s name on a raffle ticket and draw 10 tickets (or use a computer to generate 10 random names). The selected names would represent a random sample of 10 graduates.
  • Such a sample is called a simple random sample.
  • This is best way to collect a sample, because it avoids sampling bias.
  • Simple random samples are the best way to get sample that is representative of the entire population.

Convenience Samples

Research Question: Over the last five years, what is the average time to complete a degree for Duke undergrads?

  • Suppose that Duke did a poor job keeping track of its graduates, except for the health sciences departments.
  • We might find it convenient to just use the data we can get, rather than taking a true random sample of the whole population.
  • Convenience samples collected in this way often suffer from sampling bias.

Group Discussion

We can easily access ratings for products, sellers, and companies through websites. These ratings are based only on those people who go out of their way to provide a rating. If 50% of online reviews for a product are negative, do you think this means that 50% of buyers are dissatisfied with the product? Why or why not?

Experiments vs. Observational Studies

Last Time: Air Pollution and Preterm Births

  • Response variable: Full-term or preterm
  • Explanatory variable: \(PM_{10}\) level

This observational study found an association (higher \(PM_{10} \Rightarrow\) more preterm births).

  • A confounding variable is a third variable that is associated to both the explanatory variable and the response.
  • Confounding variables may be the true cause of the response. Possibilities:
    • Access to health care?
    • Urban lifestyle choices?

2.2 Experiments

Recall:

  • In an experiment, researchers put subjects in two or more groups and compare them.
  • In a randomized experiment, researchers randomly assign the groups. (e.g., stent study)

2.2.1 Principles of experimental design

Key Point: The only difference between the groups should be the treatment under consideration.

  • Controlling. Researchers do their best to control any other differences in the groups. (e.g., pill vs. placebo, blinding, double-blinding)
  • Randomization. Researchers randomize patients into treatment groups to account for variables that cannot be controlled.
    • Randomization eliminates confounding variables.
  • Replication. Everything is done in a way that others would be able to reproduce. More observations produce more conclusive results.

2.3 Observational Studies

Data where no treatment has been explicitly applied (or explicitly withheld) is called observational data.

Group Discussion: Suppose an observational study tracked sunscreen use and skin cancer, and it was found that the more sunscreen someone used, the more likely the person was to have skin cancer. Does this mean sunscreen causes skin cancer? Can you identify a possible confounding variable?

Group Discussion

In one 2009 study, a team of researchers recruited 76 volunteer subjects and divided them randomly into two groups: treatment or control. One group was given 25 grams of chia seeds twice a day, and the other was given a placebo that looked like chia seeds, but that had no nutritional value. After 12 weeks, the scientists found no significant difference between the groups in appetite or weight loss.

  1. What type of study is this?

  2. What are the experimental and control treatments in this study?

  3. Has blinding been used in this study?

  4. If the researchers had found a significant difference, could they have made a causal statement?

  5. Can we generalize the conclusion to the population at large?